
[SPARK-33902][SQL] Support CREATE TABLE LIKE for V2 #54809

Open
viirya wants to merge 7 commits into apache:master from viirya:create-table-like-v2

Conversation

@viirya
Member

@viirya viirya commented Mar 14, 2026

What changes were proposed in this pull request?

Previously, CREATE TABLE LIKE was implemented only via CreateTableLikeCommand, which bypassed the V2 catalog pipeline entirely. This meant:

  • 3-part names (catalog.namespace.table) caused a parse error
  • 2-part names targeting a V2 catalog caused NoSuchDatabaseException

This PR adds a V2 execution path for CREATE TABLE LIKE:

  • Grammar: change tableIdentifier (2-part max) to identifierReference (N-part) for both target and source, consistent with all other DDL commands
  • Parser: emit CreateTableLike (new V2 logical plan) instead of CreateTableLikeCommand directly
  • ResolveCatalogs: resolve the target UnresolvedIdentifier to ResolvedIdentifier
  • ResolveSessionCatalog: route back to CreateTableLikeCommand when both target and source are V1 tables/views in the session catalog (V1->V1 path)
  • DataSourceV2Strategy: convert CreateTableLike to new CreateTableLikeExec
  • CreateTableLikeExec: physical exec that copies schema and partitioning from the resolved source Table and calls TableCatalog.createTable()
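A minimal Scala sketch of the routing rule in the list above, written against hypothetical stand-in types (CreateTableLikeRouting, Source, and Path are not Spark classes). It only encodes the decision: CreateTableLikeCommand is kept when the target resolves to the session catalog and the source is a V1 table or view there; every other combination goes to CreateTableLikeExec.

```scala
// Hypothetical model of where CREATE TABLE LIKE ends up after analysis.
// None of these types exist in Spark; they only mirror the rules in the PR description.
object CreateTableLikeRouting {
  sealed trait Source
  // A table or view stored in the session catalog, surfaced to the analyzer as a V1Table.
  case object SessionV1TableOrView extends Source
  // A native V2 Table: from a non-session catalog such as testcat, or from a
  // connector that overrides the session catalog.
  case object NativeV2Table extends Source

  sealed trait Path
  case object V1Command extends Path // stands in for CreateTableLikeCommand
  case object V2Exec extends Path    // stands in for CreateTableLikeExec via DataSourceV2Strategy

  /** targetIsSessionCatalog: the target resolved through the session catalog. */
  def route(targetIsSessionCatalog: Boolean, source: Source): Path =
    (targetIsSessionCatalog, source) match {
      case (true, SessionV1TableOrView) => V1Command // V1 -> V1: existing behavior preserved
      case _                            => V2Exec    // V2 target, or a source that is not a V1Table
    }
}
```

Under this model, a native V2 table in a connector that overrides the session catalog also takes the V2 path, because it is not a V1Table.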

Why are the changes needed?

CREATE TABLE LIKE was implemented solely via CreateTableLikeCommand, a V1-only command that bypasses the DataSource V2 analysis pipeline entirely. As a result, it was impossible to use CREATE TABLE LIKE to create a table in a non-session V2 catalog (e.g., testcat.dst): a 2-part name like testcat.dst was misinterpreted as database testcat in the session catalog and threw NoSuchDatabaseException, while a 3-part name like testcat.ns.dst was a parse error because the grammar only accepted 2-part tableIdentifier.
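To make the two failure modes concrete, here is a simplified Scala model of how an N-part identifierReference is split into catalog, namespace, and table, next to the old tableIdentifier reading that always targets the session catalog. NameResolution and its methods are hypothetical; Spark's real resolver also handles the current catalog and namespace, which this sketch ignores.

```scala
// Hypothetical, simplified model; not Spark's actual name-resolution code.
object NameResolution {
  final case class Resolved(catalog: String, namespace: Seq[String], table: String)

  val SessionCatalog = "spark_catalog"

  /** New grammar: N parts; a leading registered catalog name selects that catalog. */
  def resolve(parts: Seq[String], registeredCatalogs: Set[String]): Resolved = {
    require(parts.nonEmpty, "empty name")
    if (parts.length > 1 && registeredCatalogs.contains(parts.head))
      Resolved(parts.head, parts.tail.init, parts.last) // catalog.namespace....table
    else
      Resolved(SessionCatalog, parts.init, parts.last)  // everything lives in the session catalog
  }

  /** Old grammar: at most 2 parts, always read as [database.]table in the session catalog. */
  def legacyTableIdentifier(parts: Seq[String]): Either[String, (Option[String], String)] = {
    require(parts.nonEmpty, "empty name")
    if (parts.length > 2) Left("parse error: at most 2 name parts")
    else Right((parts.init.headOption, parts.last)) // testcat.dst => database "testcat"
  }
}
```

With testcat registered, testcat.dst now resolves to table dst in catalog testcat, whereas the legacy reading treats testcat as a session-catalog database (hence NoSuchDatabaseException) and rejects testcat.ns.dst outright.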

This change routes CREATE TABLE LIKE through the standard V2 DDL pipeline so that V2 catalog targets are fully supported, while preserving the existing V1 behavior when both target and source resolve to the session catalog.

Does this PR introduce any user-facing change?

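Yes. CREATE TABLE LIKE now supports V2 catalogs: both the target and the source can be N-part names such as testcat.ns.dst, and a 2-part name targeting a V2 catalog no longer fails with NoSuchDatabaseException.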

How was this patch tested?

  • CreateTableLikeSuite: new integration tests covering V2 target with V1/V2 source, cross-catalog, views as source, IF NOT EXISTS, property behavior, and V1 fallback regression, etc.
  • DDLParserSuite: updated existing create table like test to match the new CreateTableLike plan shape; added 3-part name parsing test

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

@viirya viirya changed the title [][SQL] Support CREATE TABLE LIKE for V2 [SPARK-XXXXX][SQL] Support CREATE TABLE LIKE for V2 Mar 14, 2026
@viirya viirya changed the title [SPARK-XXXXX][SQL] Support CREATE TABLE LIKE for V2 [SPARK-55994][SQL] Support CREATE TABLE LIKE for V2 Mar 14, 2026
@viirya viirya changed the title [SPARK-55994][SQL] Support CREATE TABLE LIKE for V2 [SPARK-33902][SQL] Support CREATE TABLE LIKE for V2 Mar 15, 2026
@viirya viirya force-pushed the create-table-like-v2 branch from 638846e to 6e695fe March 15, 2026 20:45
viirya and others added 7 commits March 15, 2026 18:51
## What changes were proposed in this pull request?

Previously, `CREATE TABLE LIKE` was implemented only via `CreateTableLikeCommand`,
which bypassed the V2 catalog pipeline entirely. This meant:
- 3-part names (catalog.namespace.table) caused a parse error
- 2-part names targeting a V2 catalog caused `NoSuchDatabaseException`

This PR adds a V2 execution path for `CREATE TABLE LIKE`:

- Grammar: change `tableIdentifier` (2-part max) to `identifierReference`
  (N-part) for both target and source, consistent with all other DDL commands
- Parser: emit `CreateTableLike` (new V2 logical plan) instead of
  `CreateTableLikeCommand` directly
- `ResolveCatalogs`: resolve the target `UnresolvedIdentifier` to
  `ResolvedIdentifier`
- `ResolveSessionCatalog`: route back to `CreateTableLikeCommand` when both
  target and source are V1 tables/views in the session catalog (V1->V1 path)
- `DataSourceV2Strategy`: convert `CreateTableLike` to new `CreateTableLikeExec`
- `CreateTableLikeExec`: physical exec that copies schema and partitioning from
  the resolved source `Table` and calls `TableCatalog.createTable()`

## How was this patch tested?

- `CreateTableLikeSuite`: new integration tests covering V2 target with V1/V2
  source, cross-catalog, views as source, IF NOT EXISTS, property behavior,
  and V1 fallback regression
- `DDLParserSuite`: updated existing `create table like` test to match the new
  `CreateTableLike` plan shape; added 3-part name parsing test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add two tests covering the case where the source is a V2 table in a
non-session catalog and the target resolves to the session catalog.
These exercise the CreateTableLikeExec → V2SessionCatalog path and
confirm that schema and partitioning are correctly propagated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add two tests to CreateTableLikeSuite documenting that pure V2 catalogs
(e.g. InMemoryCatalog) accept any provider string without validation,
while V2SessionCatalog rejects non-existent providers by delegating to
DataSource.lookupDataSource. This is consistent with how CreateTableExec
handles the USING clause for other V2 DDL commands.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
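A hypothetical Scala model of the two provider behaviors this commit documents. TargetCatalog, PermissiveCatalog, and ValidatingCatalog are illustrative stand-ins, knownProviders takes the place of DataSource.lookupDataSource, and the error text is made up.

```scala
// Illustrative model only; these traits and classes are not Spark APIs.
object ProviderValidation {
  trait TargetCatalog {
    def createTable(name: String, properties: Map[String, String]): Either[String, Map[String, String]]
  }

  // Like a pure V2 catalog (e.g. InMemoryCatalog): the provider string is stored as-is.
  object PermissiveCatalog extends TargetCatalog {
    def createTable(name: String, properties: Map[String, String]): Either[String, Map[String, String]] =
      Right(properties)
  }

  // Like V2SessionCatalog: a provider that cannot be looked up is rejected.
  final class ValidatingCatalog(knownProviders: Set[String]) extends TargetCatalog {
    def createTable(name: String, properties: Map[String, String]): Either[String, Map[String, String]] =
      properties.get("provider") match {
        case Some(p) if !knownProviders.contains(p.toLowerCase) => Left(s"unknown provider: $p")
        case _                                                   => Right(properties)
      }
  }
}
```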
…CREATE TABLE LIKE

Two new tests covering previously untested code paths in CreateTableLikeExec:
- Source provider is copied to V2 target as PROP_PROVIDER when no USING override
  is given, consistent with how CreateTableExec handles other V2 DDL.
- CHAR(n)/VARCHAR(n) types declared on a V1 source are preserved in the V2
  target via CharVarcharUtils.getRawSchema, not collapsed to StringType.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add inline comment explaining the six reasons withConstraints is
intentionally omitted: V1 behavior parity, ForeignKey cross-catalog
dangling references, constraint name collision risk, validation status
semantics on empty tables, NOT NULL already captured in nullability,
and PostgreSQL precedent (INCLUDING CONSTRAINTS opt-in). Also notes
the path forward if constraint copying is added in the future.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Clarify that V1 tables (CatalogTable) have no constraint objects at all
since CHECK/PRIMARY KEY/UNIQUE/FOREIGN KEY are V2-only concepts added in
Spark 4.1.0, rather than saying CreateTableLikeCommand "never copied"
them which implies an intentional decision rather than absence of the
feature.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ed identifiers

After the CREATE TABLE LIKE V2 change, the target and source identifiers
in CreateTableLikeCommand are now fully qualified (spark_catalog.default.*)
because ResolvedV1Identifier explicitly adds the catalog component via
ident.asTableIdentifier.copy(catalog = Some(catalog.name)), and
ResolvedV1TableIdentifier returns t.catalogTable.identifier which also
includes the catalog. Update the analyzer golden file accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
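A small Scala sketch of why the golden file changed, using a simplified, hypothetical stand-in for TableIdentifier (TableIdent and qualifiedName are illustrative, and the dot-joined rendering is an assumption): copying the catalog name into the identifier turns default.dst into spark_catalog.default.dst.

```scala
// Simplified stand-in for Spark's TableIdentifier; only the fields needed here.
object QualifiedIdentifiers {
  final case class TableIdent(table: String, database: Option[String] = None, catalog: Option[String] = None) {
    // Joins the present components with dots; Spark's own rendering may quote parts.
    def qualifiedName: String = ((catalog.toSeq ++ database.toSeq) :+ table).mkString(".")
  }

  // Mirrors the step quoted in the commit message:
  // ident.asTableIdentifier.copy(catalog = Some(catalog.name))
  def withCatalog(ident: TableIdent, catalogName: String): TableIdent =
    ident.copy(catalog = Some(catalogName))
}
```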
@viirya viirya force-pushed the create-table-like-v2 branch from 6e695fe to 6e3053c March 16, 2026 01:51
@aokolnychyi
Contributor

I'll take a look later today.

// For CREATE TABLE LIKE, use the v1 command if both the target and source are in the session
// catalog (or a V1-compatible catalog extension). If source is in a different catalog, fall
// through to the V2 execution path (CreateTableLikeExec via DataSourceV2Strategy).
case CreateTableLike(
Contributor

What does this mean for DSv2 connectors that override the session catalog?

Contributor

An example is the Iceberg session catalog.

Member

yea, agreed, we should add a test for the session catalog

Member Author

Good catch. When a connector like Iceberg overrides the session catalog, the target resolves through ResolvedV1Identifier (since supportsV1Command returns true for the session catalog), but the source — a native Iceberg Table — does NOT match ResolvedV1TableOrViewIdentifier (which requires V1Table). So ResolveSessionCatalog falls through and CreateTableLikeExec handles it, passing the Iceberg Table directly. The target createTable call goes to V2SessionCatalog, which delegates to the Iceberg catalog extension. This should work, but deserves a test. I can add one if you'd like.

// CHAR/VARCHAR types are preserved as declared (without internal metadata expansion).
val columns = sourceTable match {
case v1: V1Table =>
val rawSchema = CharVarcharUtils.getRawSchema(v1.catalogTable.schema)
Contributor

Do we have tests for this?

Member Author

Yes — the test "CHAR and VARCHAR types are preserved from v1 source to v2 target" in CreateTableLikeSuite covers this. It creates a V1 source with CHAR(10) and VARCHAR(20), runs CREATE TABLE testcat.dst LIKE src, and asserts schema("name").dataType === CharType(10) and schema("tag").dataType === VarcharType(20).
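For context on why getRawSchema is needed at all, a self-contained Scala model of the idea: CHAR/VARCHAR columns are kept as string columns whose metadata records the declared type, and the raw schema restores that declared type. All types here are hypothetical stand-ins, and the metadata key is an assumption meant to mirror Spark's internal key.

```scala
// Hypothetical model of the idea behind CharVarcharUtils.getRawSchema; not Spark code.
object RawSchemaModel {
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType
  final case class CharType(length: Int) extends DataType
  final case class VarcharType(length: Int) extends DataType

  final case class Field(name: String, dataType: DataType, metadata: Map[String, String] = Map.empty)

  // Assumed metadata key holding the declared type string, e.g. "char(10)".
  val RawTypeKey = "__CHAR_VARCHAR_TYPE_STRING"
  private val CharPattern = """(?i)char\((\d+)\)""".r
  private val VarcharPattern = """(?i)varchar\((\d+)\)""".r

  // Restores the declared CHAR/VARCHAR type; other fields pass through unchanged.
  def getRawSchema(schema: Seq[Field]): Seq[Field] = schema.map { f =>
    f.metadata.get(RawTypeKey) match {
      case Some(CharPattern(n))    => f.copy(dataType = CharType(n.toInt))
      case Some(VarcharPattern(n)) => f.copy(dataType = VarcharType(n.toInt))
      case _                       => f
    }
  }
}
```

Copying the stored schema without this step would hand the V2 target plain string columns, which is the collapse to StringType the test guards against.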

case class CreateTableLikeExec(
targetCatalog: TableCatalog,
targetIdent: Identifier,
sourceTable: Table,
Contributor

Does this mean it would only work for creating V2 table from another V2 table?

Contributor

Oh, so this can be a V1Table that wraps a CatalogTable?

Member Author

Correct on both. sourceTable: Table is the V2 Table interface, which can be any implementation. For session catalog sources, ResolveRelations wraps the CatalogTable in a V1Table, which implements Table. So V1→V2 works: the source is a V1Table and we handle it explicitly in the match block at line 57 to preserve CHAR/VARCHAR types.

val partitioning = sourceTable.partitioning

// 3. Resolve provider: USING clause overrides, else copy from source.
val resolvedProvider = provider.orElse {
Contributor

Isn't this the source provider rather than the target's? Can we actually populate it?

Contributor

What does DSv1 do and is it applicable?

Member Author

Yes, this is the source provider being copied to the target — which is exactly the semantics of CREATE TABLE LIKE: the target inherits the source's format unless overridden by a USING clause. This matches V1 CreateTableLikeCommand behavior, which also copies the source provider. The copied provider goes into PROP_PROVIDER in finalProps and is passed to catalog.createTable. Whether the target catalog uses it is catalog-specific: InMemoryCatalog stores it as-is; V2SessionCatalog validates it via DataSource.lookupDataSource.
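A minimal Scala sketch of the provider rule as described in this reply: a USING clause overrides, otherwise the source provider is copied, and the result is stored under PROP_PROVIDER (whose value is "provider"). ProviderResolution and finalProps are hypothetical names; the exec also handles location and other properties, which are omitted here.

```scala
// Hypothetical sketch; only the provider part of the final properties is modeled.
object ProviderResolution {
  val PropProvider = "provider" // value of TableCatalog.PROP_PROVIDER

  def finalProps(
      userProps: Map[String, String],
      usingClause: Option[String],
      sourceProvider: Option[String]): Map[String, String] = {
    // USING clause wins; otherwise inherit the source table's provider, if any.
    val resolved = usingClause.orElse(sourceProvider)
    userProps ++ resolved.map(PropProvider -> _)
  }
}
```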

locationProp

try {
// Constraints from the source table are intentionally NOT copied for several reasons:
Contributor

This comment is too long to include here; let's shorten it?

Member Author

Okay. I'll shorten it.

@aokolnychyi
Contributor

@gengliangwang @cloud-fan, can you folks help review as well?

@aokolnychyi
Contributor

cc @szehon-ho as well

// If constraint copying is desired, use ALTER TABLE ADD CONSTRAINT after creation.
// If we wanted to support them in the future, the right approach would be to add an
// INCLUDING CONSTRAINTS clause (as PostgreSQL does) rather than copying blindly.
val tableInfo = new TableInfo.Builder()
Contributor

Owner?

Member Author

Good point. CatalogV2Util.convertTableProperties (used by CreateTableExec) calls withDefaultOwnership to add the current user as owner. We should do the same by adding CatalogV2Util.withDefaultOwnership(finalProps). I'll add that.
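A sketch of the ownership fix agreed on here, assuming it amounts to setting PROP_OWNER (value "owner") to the current user on the final properties. The Ownership object is hypothetical, the user is passed in explicitly rather than looked up, and this version overwrites any existing owner entry, which may differ from the exact behavior of CatalogV2Util.withDefaultOwnership.

```scala
// Hypothetical sketch of adding the current user as owner to the final table properties.
object Ownership {
  val PropOwner = "owner" // value of TableCatalog.PROP_OWNER

  // Always sets the owner key (overwriting any existing value) in this sketch.
  def withDefaultOwnership(props: Map[String, String], currentUser: String): Map[String, String] =
    props + (PropOwner -> currentUser)
}
```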

* - Source table's TBLPROPERTIES (user-specified `properties` are used instead)
* - Statistics, owner, create time
*/
case class CreateTableLikeExec(
Contributor

@aokolnychyi aokolnychyi Mar 16, 2026

Do we have V1 -> V2 tests both within a catalog and across catalogs?

Member Author

Yes — "v2 target, v1 source: schema and partitioning are copied" tests V1 source (default.src in session catalog) → V2 target (testcat.dst). The "cross-catalog" and "3-part name" tests cover V2→V2 across catalogs.

@sarutak
Member

sarutak commented Mar 16, 2026

The proposed behavior seems different from CREATE TABLE LIKE in Databricks Runtime. Is it OK?

I wonder if we can delegate what to copy on CREATE TABLE LIKE to each table format implementation?

ResolvedTable(_, _, table, _),
fileFormat: CatalogStorageFormat, provider, properties, ifNotExists) =>
CreateTableLikeExec(
catalog.asTableCatalog, ident, table, fileFormat, provider, properties, ifNotExists) :: Nil
Member

The three CreateTableLike match cases (for ResolvedTable, ResolvedPersistentView, ResolvedTempView) are nearly identical. Consider consolidating into a single pattern:

    case CreateTableLike(
        ResolvedIdentifier(catalog, ident), source,
        fileFormat: CatalogStorageFormat, provider, properties, ifNotExists) =>
      val table = source match {
        case ResolvedTable(_, _, t, _) => t
        case ResolvedPersistentView(_, _, meta) => V1Table(meta)
        case ResolvedTempView(_, meta) => V1Table(meta)
      }
      CreateTableLikeExec(
        catalog.asTableCatalog, ident, table, fileFormat, provider, properties, ifNotExists) :: Nil

Member Author

Good suggestion. I can refactor to the single pattern you proposed, with the source table resolved in an inner match. This is cleaner and removes the duplication. I wonder if there are precedents for DDL commands that consolidate both V1 and V2 cases this way?

targetCatalog: TableCatalog,
targetIdent: Identifier,
sourceTable: Table,
fileFormat: CatalogStorageFormat,
Member

fileFormat: CatalogStorageFormat carries inputFormat/outputFormat/serde fields, but only locationUri is used (line 84). Consider narrowing the exec's parameter to location: Option[URI] to make the contract explicit, leaving the full CatalogStorageFormat only in the logical plan (where the V1 fallback path needs it).

Member Author

Valid. Only locationUri is used in CreateTableLikeExec. I'll change the exec's parameter to location: Option[URI] and extract it at the DataSourceV2Strategy callsite.
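A Scala sketch of the narrowed contract, with StorageFormat as a simplified, hypothetical stand-in for CatalogStorageFormat and ExecSketch standing in for CreateTableLikeExec: the full storage format stays in the logical plan, and only locationUri is extracted at the strategy callsite and turned into the "location" property (the value of TableCatalog.PROP_LOCATION).

```scala
import java.net.URI

// Hypothetical sketch of the refactor; not the actual Spark classes.
object LocationNarrowing {
  // Simplified stand-in for CatalogStorageFormat (the V1 fallback still needs all fields).
  final case class StorageFormat(
      locationUri: Option[URI],
      inputFormat: Option[String] = None,
      outputFormat: Option[String] = None,
      serde: Option[String] = None)

  // The exec's contract is explicit: it only ever sees an optional location.
  final case class ExecSketch(location: Option[URI]) {
    def locationProp: Map[String, String] =
      location.map(uri => Map("location" -> uri.toString)).getOrElse(Map.empty)
  }

  // Extraction happens at the strategy callsite, so serde/input/output formats never reach the exec.
  def plan(fileFormat: StorageFormat): ExecSketch = ExecSketch(fileFormat.locationUri)
}
```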

val v1 = "CREATE TABLE table1 LIKE table2"
// Helper to extract fields from the new CreateTableLike unresolved plan.
// The parser now emits CreateTableLike (v2 logical plan) instead of
// CreateTableLikeCommand, so both name and source are unresolved identifiers.
Member

The source is UnresolvedTableOrView, not an unresolved identifier:

Suggested change
// CreateTableLikeCommand, so both name and source are unresolved identifiers.
// CreateTableLikeCommand, so the name is an UnresolvedIdentifier and the source is an UnresolvedTableOrView.

Member Author

Good catch, I'll apply the suggestion.

@gengliangwang
Member

BTW, consider unifying to a single CreateTableLikeExec. The current PR keeps two execution paths: the V1 fallback via CreateTableLikeCommand (for V1-V1 cases) and the new CreateTableLikeExec (for V2 targets). The test "v2 source, v1 target" already proves that CreateTableLikeExec works for session catalog targets via V2SessionCatalog.

* @param properties User-specified TBLPROPERTIES.
* @param ifNotExists IF NOT EXISTS flag.
*/
case class CreateTableLike(
Member

Can we have one single command (UnaryRunnableCommand)? I thought that's the preferred way now to reduce plan complexity across the different stages.


5 participants